-
Couldn't load subscription status.
- Fork 1.1k
SIP-72: dedented triple-quoted string literals #24185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
That was fast! |
|
Not ready to review yet! Still need a bit more vibing haha |
Ah right, I'll convert it to draft then |
|
|
||
| val hasTabs = closingIndent.contains('\t') | ||
| val hasSpaces = closingIndent.contains(' ') | ||
| if (hasTabs && hasSpaces) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to detect this in one loop
| else | ||
| literal(inTypeOrSingleton = true) | ||
|
|
||
| /** Dedent a string literal by removing common leading whitespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For new code in the compiler we use indentation syntax and new conditional if / then / else syntax. The old Java conditional syntax is already disabled under -language.future.
| val isDedented = | ||
| in.charOffset + 2 < in.buf.length && | ||
| in.buf(in.charOffset - 1) == '\'' && | ||
| in.buf(in.charOffset) == '\'' && | ||
| in.buf(in.charOffset + 1) == '\'' | ||
| in.nextToken() | ||
| def nextSegment(literalOffset: Offset) = | ||
| segmentBuf += Thicket( | ||
| literal(literalOffset, inPattern = inPattern, inStringInterpolation = true), | ||
| atSpan(in.offset) { | ||
| if (in.token == IDENTIFIER) | ||
| termIdent() | ||
| else if (in.token == USCORE && inPattern) { | ||
| in.nextToken() | ||
| Ident(nme.WILDCARD) | ||
| } | ||
| else if (in.token == THIS) { | ||
| in.nextToken() | ||
| This(EmptyTypeIdent) | ||
| } | ||
| else if (in.token == LBRACE) | ||
| if (inPattern) Block(Nil, inBraces(pattern())) | ||
| else expr() | ||
| else { | ||
| report.error(InterpolatedStringError(), source.atSpan(Span(in.offset))) | ||
| EmptyTree | ||
| } | ||
| }) | ||
|
|
||
| var offsetCorrection = if isTripleQuoted then 3 else 1 | ||
| while (in.token == STRINGPART) | ||
| nextSegment(in.offset + offsetCorrection) | ||
| // Collect all string parts and their offsets | ||
| val stringParts = new ListBuffer[(String, Offset)] | ||
| val interpolatedExprs = new ListBuffer[Tree] | ||
|
|
||
| var offsetCorrection = if (isDedented) 3 else if (isTripleQuoted) 3 else 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit is super sketchy, I'm sure there's a better way
|
Marking this as ready to review since the SIP is has been voted into experimental phase |
809656c to
3aefb59
Compare
|
The last failure turned out to be due to error message positioning issues, fixed that and it's all green now |
|
|
||
| val isDedented = | ||
| in.charOffset + 2 < in.buf.length && | ||
| in.buf(in.charOffset - 1) == '\'' && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we look at offset - 1 here? And we should be able to reuse the logic of isDedentedStringLiteral?
| * | ||
| * @param str The string content to dedent | ||
| * @param offset The source offset where the string literal begins | ||
| * @return The dedented string, or str if errors were reported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the importance of other parameters, it would be better to explain them as well, maybe with an example.
| } | ||
|
|
||
| /** Extract the closing indentation from the last line of a string */ | ||
| private def extractClosingIndent(str: String, offset: Offset): (String, Boolean) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use an Option[String] for the result.
| val linesAndWithSeps = (str.linesIterator.zip(str.linesWithSeparators)).toSeq | ||
| var lineOffset = offset | ||
| // start counting error location offsets only after opening delimiter | ||
| while(in.buf(lineOffset) == '\'') lineOffset += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing a space after while
| @@ -0,0 +1,288 @@ | |||
| // Test runtime behavior of dedented string literals | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the following small tests:
val x = s'''
${"content with\nnewline"}
more text
'''
val nested = s'''
outer ${'''
inner
'''}
'''
val nested2 = s'''
outer ${s'''
inner with $x
'''}
'''
Implements scala/improvement-proposals#112
The initial Lexing/Scanning is lenient, only looking for the opening
''''*and equivalent closing delimiter. This matches how we can expect this to be implemented in other tools that have more restricted lexing frameworks (IntelliJ w/ JFlex, VSCode w/ TextMate Grammars, NeoVim w/ TreeSitter)All other validation (opening delimiter must be followed by newline, closing delimiter must be preceded by whitespace only) and de-denting is left to the parsing phase, which is the only time we have a complete string "literal" when an interpolator is present, and thus are able to look at the trailing delimiter's preceding indent whitespace and trim it from all earlier
STRINGLIT/STRINGPARTtokensdef interpolatedStringneeded to be refactored to support dedenting: rather than constructing the trees immediately, we first assemble all the strings parts, then use the last string part to compute the dedent that we apply to all other parts, and only then do we construct the treesCovered by
neg/tests andrun/tests for all the major features and edge cases I could think of:sandf@compileTimeOnlyI haven't managed to reliably run tests for some reason, I think I'm bumping into https://contributors.scala-lang.org/t/current-testcompilation/7256. But I tested it manually by copy-pasting the run/neg test files into the
bin/scalaREPL and compared the output manually with the.checkfiles on disk, and the output is identical